# documentation
library(knitr)
# wrangling
library(tidyr)
library(dplyr)
library(tibble)
library(readr)
library(glue)
# visualization
library(ggplot2)
library(ggthemes)
library(plotly)
library(patchwork)

13 Climate change: Drivers
Learning Objectives
After completing this tutorial you should be able to:
- analyze earth’s global temperatures to determine if they are increasing
- analyze CO2 data to determine if atmospheric levels are increasing
- compare rates of increase of CO2 and global temperature
- compare current trends with rates of change during pre-historic periods using ice core data
- interpret results to understand current climate change
Download the directory for this project here, make sure the directory is unzipped, and move it to your bi328 directory. You can open the Rproj file for this module either by double-clicking on it, which will launch RStudio, or by opening RStudio and then using File > Open Project, or by clicking on the R project icon in the top right of your program window and selecting Open Project.
There should be a file named 13_greenhouse-gases.qmd in that project directory. Use that file to work through this tutorial - you will hand in your rendered (“knitted”) quarto file as your homework assignment. So, first thing, in the YAML header change the author to your name. You will use this quarto document to record your answers. Remember to use comments to annotate your code; at minimum you should have one comment per code set1. You may, of course, add as many comments as you need to be able to recall what you did. Similarly, take notes in the document as we discuss discussion/reflection questions, but make sure that you go back and clean them up for “public consumption”.
1 You should do this whether you are adding code yourself or using code from our manual, even if it isn’t commented in the manual… especially when the code is already included for you, add comments to describe how the function works/what it does as we introduce it during the participatory coding session so you can refer back to it.
Let’s make sure to read in the libraries we need for this analysis.
13.1 Unequivocal and unprecedented?
Current climate change is concerning not (only) because of the absolute warmth of the earth but because of the rate at which it is occurring. What this means is that over the history of our planet there have been times at which temperatures have been higher than what we are currently experiencing. The component that is “unprecedented” is the rate of increase and the impact that rate is having on our planet2.
2 We will look at examples of that impact using various data sets describing change in the earth-climate system.
In this chapter, we are going to dive into a series of data sets that will allow us to explore whether we do indeed have evidence that the climate change we are currently observing is indeed unequivocal, driven by anthropogenic effects, and unprecedented in its rate.
We will also need to consider limitations of our approach. In this case, we will need to assess whether our approach is investigating a causal or mechanistic effect in and of itself or if we are uncovering evidence consistent with a known mechanism/process.
13.2 Current rates of air temperature change
Let’s start by taking a look at changes in global mean air temperatures.
3 You can do this on a piece of paper and snap a picture or sketch directly on a computer/mobile device in an appropriate app. Save your picture as scratch/glob-temp-patterns.jpg and you can use the markdown code below to import it. If you end up with a different filename or file format, adjust your code accordingly.
The air temperature data we will be using is compiled by the Goddard Institute for Space Studies (NASA) and can be accessed on their webpage, which also describes how the data set was compiled and processed.
Remember to correctly specify the delim argument for a *.csv file.
# read csv file
temperature <- read_delim("data/GLB.Ts+dSST.csv", delim = ",")

Before we move on, let’s check out our temperature object in the Global Environment to see if the file read in okay - spoiler alert, it did not4.
4 You should make a habit of always checking that your data has read in as expected; immediately determining that something is wrong and correcting it will minimize issues with troubleshooting down the line.
Troubleshooting Skills: Your file didn’t read in correctly. Let’s figure out why.
A good starting point is to open the file in a text editor to see if we missed anything. RStudio now has a built-in text editor. Use the file navigation pane to navigate to the data folder. Clicking on the temperature file will create a pop-up; select View File and it will open in the View pane in a separate tab next to your quarto pane5.
Sure enough there seems to be an additional line at the beginning which is probably causing the issue. One way to fix this is to simply delete the extra line … before you do this, remember our first module looking at data wrangling and the cardinal rules we set in place? One key principle is “DO NOT EDIT YOUR RAW FILES”: if we want to have a reproducible workflow we should avoid manually editing our data sets.
Instead use ?read_delim to pull up the help file for the function. You should find an argument called skip which will let us tell the function how many extra lines there are at the beginning of the file.
# read csv file skipping the first line
temperature <- read_delim("data/GLB.Ts+dSST.csv", delim = ",", skip = 1)
# check that dataframe is in order
head(temperature)

# A tibble: 6 × 19
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
1 1880 -0.18 -0.24 -0.09 -0.16 -0.1 -0.21 -0.18 -0.1 -.15 -.23 -.22 -.17
2 1881 -0.19 -0.14 0.03 0.05 0.06 -0.18 0 -0.04 -.15 -.22 -.18 -.07
3 1882 0.16 0.14 0.04 -0.17 -0.15 -0.23 -0.16 -0.07 -.14 -.24 -.17 -.36
4 1883 -0.29 -0.37 -0.12 -0.18 -0.17 -0.08 -0.06 -0.14 -.21 -.11 -.23 -.11
5 1884 -0.13 -0.08 -0.36 -0.4 -0.34 -0.36 -0.3 -0.27 -.27 -.25 -.33 -.31
6 1885 -0.58 -0.34 -0.27 -0.42 -0.45 -0.43 -0.34 -0.31 -.29 -.24 -.24 -.11
# ℹ 6 more variables: `J-D` <chr>, `D-N` <chr>, DJF <chr>, MAM <dbl>,
# JJA <dbl>, SON <chr>
That seems to have done the trick! Put that in your bag of tricks for future reference.
5 Be really careful not to accidentally edit the raw data file!
Now that that is resolved, let’s take a slightly more detailed look at our data set to make sure there aren’t any additional changes we need to make. For example, we need to determine whether there are NA values and whether they are properly formatted, and we need to make sure all the columns are numeric.
NA values have been specified as ***, which has caused some columns to be formatted as character instead of numeric.
We can use replace() to search every column (we specify this using . instead of a specific column name) and mutate_if(), which tells R to check every column and, if it is a character data type (is.character), to convert it to numeric (as.numeric).
temperature <- temperature %>%
replace(. == "***", NA) %>%
mutate_if(is.character, as.numeric)

We are going to use ggplot2, which is part of the tidyverse, to plot our figure. This will introduce you to some of the standard syntax. Fear not, we will work through the details of the framework on which ggplot2 relies in the next chapter, so think of this as a sneak peek to get used to the syntax.
ggplot(temperature, aes(x = Year, y = `J-D`)) + # define data set and columns to plot on x and y axis
geom_line(color = "blue", size = 1) + # plot line plot
labs(x = "year", y = "temperature anomaly [C]", # determine labels
title = "Annual mean global surface temperature relative to 1951-1980 average.",
caption = "source: NASA Goddard Institute for Space Studies")

Here’s what your figure should look like based on that code.
This figure gives us a qualitative view of changing global temperatures.
For a quantitative assessment we would want to determine the rate of change. For this case we would define the rate of change as the change in temperature divided by the change in time for a certain time period. A more general definition would be that you are calculating the slope of the line you have fit through the data as the change-in-y divided by the change-in-x6.
6 If you compare the two figures you should see that fitting a linear regression is an oversimplification, but it will allow us to make a quantitative comparison.
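To make the change-in-y over change-in-x idea concrete, here is a back-of-the-envelope sketch. The anomaly values are made-up, illustrative numbers, not taken from the NASA data set; a regression fit through all the points (which we do below) is the more robust way to estimate the rate.

```r
# Back-of-the-envelope rate of change: change-in-y divided by change-in-x.
# The anomaly values below are illustrative, not from the NASA data set.
t_start <- -0.16  # hypothetical anomaly (deg C) in 1880
t_end   <-  0.85  # hypothetical anomaly (deg C) in 2020
rate <- (t_end - t_start) / (2020 - 1880)  # deg C per year
round(rate, 4)
```

Note that this two-point calculation is very sensitive to which endpoints you pick; fitting a line through all the data averages out year-to-year variability.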
We can visualize this by adding a layer to our figure using geom_smooth() and setting the method to lm (linear regression).
ggplot(temperature, aes(x = Year, y = as.numeric(`J-D`))) +
geom_line(color = "blue", size = 1) +
geom_smooth(method = "lm", color = "red", size = 1, se = FALSE) +
labs(x = "year", y = "temperature anomaly [C]",
title = "Annual mean global surface temperature relative to 1951-1980 average.",
caption = "source: NASA Goddard Institute for Space Studies")

Here is what your figure will look like with that added layer.
That’s helpful as a visualization, but to really quantitatively assess the rate of change we need to fit a linear regression of the form y = mx + b, with m as the slope and b as the intercept. With that equation we can determine the rate of change by extracting m: the larger the slope m, the steeper the fitted line and the more rapid the change in temperature.
We can fit the linear regression using the function lm().
# fit linear regression
score_model <- lm(`J-D` ~ Year, data = temperature)
# view summary of results
summary(score_model)
Call:
lm(formula = `J-D` ~ Year, data = temperature)
Residuals:
Min 1Q Median 3Q Max
-0.36171 -0.13297 -0.02646 0.12807 0.45284
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.515e+01 7.106e-01 -21.32 <2e-16 ***
Year 7.797e-03 3.641e-04 21.41 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1798 on 141 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.7648, Adjusted R-squared: 0.7631
F-statistic: 458.5 on 1 and 141 DF, p-value: < 2.2e-16
We are interested in the equation of the regression line, how well the data fit the line, and whether or not the regression is significant. We can use the output of the linear regression to determine all of this. The Estimate column shows the values for the intercept b and the slope m of the variable (in this case Year).
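If you want just the slope itself rather than the full summary, coef() extracts the estimated coefficients from a fitted model. The sketch below uses a small synthetic data set so it stands on its own; with the real fit you would use coef(score_model)["Year"] instead.

```r
# Extract the rate of change (slope) from a fitted model with coef().
# Synthetic, perfectly linear data so the example is self-contained;
# for the real analysis you would use coef(score_model)["Year"].
toy <- data.frame(Year = 1880:2022)
toy$anomaly <- -15.15 + 0.0078 * toy$Year   # made-up linear trend
toy_model <- lm(anomaly ~ Year, data = toy)
rate_per_year <- unname(coef(toy_model)["Year"])   # deg C per year
rate_per_century <- rate_per_year * 100            # deg C per century
round(rate_per_year, 4)
```

Converting the slope to degrees per decade or per century often makes the rate easier to interpret and compare.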
13.3 Rate of change of atmospheric CO2
Next, let’s determine the rate of change of atmospheric CO2.
Dr. Charles David Keeling (1928 - 2005) began collecting atmospheric CO2 concentration data at the Mauna Loa Observatory (Hawaii); this data set comprises the longest measurement of atmospheric CO2 concentrations7. This data set has been fundamental to our understanding of the role of human activities, such as fossil fuel burning, in driving climate change.
7 Longest measurement using instruments - we will see later that we can use proxy data to indirectly measure CO2 levels for much longer time periods
We can access the data set directly from the Global Monitoring Laboratory. Select the Data tab, then download the csv data set Mauna Loa CO2 annual mean data.
Once you have downloaded the data set, make sure to move it to the data folder of your project directory, then read it into R using the following code8.
8 If you look at the raw data using a text editor you will quickly see why we need to include the skip = 55 argument
CO2 <- read_delim("data/co2_annmean_mlo.csv", delim = ",", skip = 55)
head(CO2)

# A tibble: 6 × 3
year mean unc
<dbl> <dbl> <dbl>
1 1959 316. 0.12
2 1960 317. 0.12
3 1961 318. 0.12
4 1962 318. 0.12
5 1963 319. 0.12
6 1964 320. 0.12
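To quantify the CO2 trend, we can reuse exactly the workflow we applied to the temperature data: fit a linear regression of the annual mean against year and read off the slope. As a self-contained sketch, the six values below are made-up numbers in the ballpark of early Mauna Loa annual means, not the real records; with the data set you just read in you would fit lm(mean ~ year, data = CO2).

```r
# Same rate-of-change workflow as for temperature, applied to CO2-style
# data. These values are illustrative, not the real Mauna Loa means;
# with the real data set you would fit lm(mean ~ year, data = CO2).
co2_demo <- data.frame(
  year = 1959:1964,
  mean = c(315.98, 316.91, 317.64, 318.45, 318.99, 319.62)
)
co2_model <- lm(mean ~ year, data = co2_demo)
rate_ppm_per_year <- unname(coef(co2_model)["year"])
round(rate_ppm_per_year, 2)
```

Comparing this slope with the temperature slope addresses the learning objective of comparing the two rates of increase; just keep in mind the units differ (ppm per year versus degrees C per year), so compare trends rather than raw numbers.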
13.4 Comparison of current and pre-historic rates of change
The most recent IPCC assessment has labeled the increase in temperatures driving contemporary climate change as “unprecedented”. However, temperatures have changed in earth’s past, and temperatures have even been higher than what we are experiencing now. What is unprecedented is the rate at which this is occurring, at least according to the IPCC.
Shall we investigate?
To do this we will need to look at past climate change. The two data sets we just took a look at are measurements of temperature and CO2 using instrumentation, i.e. we have directly measured values for the parameters we are interested in at different points in time. Dr. Keeling was one of the first scientists to consider the importance of long-term monitoring sites. The rapid changes taking place in our environment have created a push to generate long-term data sets with a focus on making them accessible. We will take a look at some of these data sets later in this semester and you will likely use at least one of them for your own data science project.
However, we frequently have questions that might extend beyond data sets like the two we used above.
How can we access data from before we had instrumentation? One way is to use so-called proxy data sets.
For a deep dive on proxy data sets and climate archives, you can check out a detailed interactive brief on proxy data.
For our assessment we are going to explore proxy data derived from ice cores, which, next to ocean sediments, give us some of the longest records of past climate conditions.
Hundreds of ice cores extracted from polar ice have proven valuable to understanding changes in atmospheric chemistry over pre-historic time. Here, we can make use of the fact that as the ice is formed, air bubbles are trapped. Because the ice has remained frozen, these bubbles still have the same composition of gases as at the time they were trapped. The depth of an ice core is correlated with time: deeper ice is older. In other words, ice cores form an archive of atmospheric conditions over time. We can directly measure CO2 from the air bubbles trapped in the ice, and we can measure isotopic ratios of oxygen in the water molecules of the core to derive temperature.
The Vostok ice core data set has been constructed using ice cores from the Vostok research station in Antarctica and can be accessed through the Carbon Dioxide Information Analysis Center.
Let’s start by taking a look at the temperature data. Use the code below to read the data set, which has been downloaded and placed in the data folder for you, into R as a data frame9.
9 We are using a slightly different method from before, which allowed us to download the data directly into our data folder. We are also using read_table2() from the readr package because the columns in our text file are separated by varying amounts of white space rather than a single consistent delimiter.
# load dataset
vostok_temp <- read_table2("data/vostok_temp.txt",
skip = 60,
col_names = c("depth", "age_ice", "deuterium", "temperature"))

Now we can create a plot of the temperature data over time. The age is recorded as years before present. For better visualization we will convert this to “thousand years ago” by dividing that number by 1,000.
ggplot(vostok_temp, aes(x = age_ice/1000, y = temperature)) +
geom_line(color = "blue", size = 1) +
labs(x = "thousand years before present", y = "Temperature variation [C]",
title = "Temperature variation during glacial/interglacial periods",
caption = "Data: Carbon Dioxide Information Analysis Center (CDIAC)")

Plot of the temperature data derived from the Vostok ice cores.
ggplot(vostok_temp, aes(x = age_ice/1000, y = temperature)) +
geom_line(color = "blue", size = 1) +
geom_smooth(method = "lm", color = "red", se = FALSE) +
labs(x = "thousand years before present", y = "Temperature variation [C]",
title = "Temperature derived from Vostok ice core, Antarctica.",
caption = "Data: Carbon Dioxide Information Analysis Center (CDIAC)")

And then we still need to fit a linear regression.
# fit linear regression
score_model <- lm(temperature ~ age_ice, data = vostok_temp)
# view summary of results
summary(score_model)
Call:
lm(formula = temperature ~ age_ice, data = vostok_temp)
Residuals:
Min 1Q Median 3Q Max
-4.7781 -2.3667 -0.5821 1.9830 7.7603
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.631e+00 8.462e-02 -54.725 <2e-16 ***
age_ice 7.850e-07 4.987e-07 1.574 0.116
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.896 on 3309 degrees of freedom
Multiple R-squared: 0.0007483, Adjusted R-squared: 0.0004463
F-statistic: 2.478 on 1 and 3309 DF, p-value: 0.1155
Remember, when you “argue” an answer you need to state your conclusion and then support that statement.
You’re right - to determine whether the currently observed rate of change is “unprecedented” or not, we need to identify the past time periods with the fastest rates of change and calculate those rates.
You’ve already used plotly in a previous chapter to plot an interactive graph. This will make it a lot easier to identify specific time periods, because as you hover over any part of the line graph a pop-up will give you the data points. Previously we wrapped the entire ggplot() call in the ggplotly() function. In this case, it is easier to first create an object that holds the ggplot() output and then use that object as the argument for ggplotly().
To do this I will show you a little trick to plot an interactive figure:
p1 <- ggplot(vostok_temp, aes(x = age_ice/1000, y = temperature)) +
geom_line(color = "blue", size = .75) +
labs(x = "thousand years before present", y = "Temperature variation [C]",
caption = "Data: Carbon Dioxide Information Analysis Center (CDIAC)")
ggplotly(p1)

Here’s what your code could look like; we’ve previously used the technique where you create an object to hold your variables to make it easier to reuse the code.
# define time range
min_year_vostok <-
max_year_vostok <-
# filter data set + plot
vostok_temp_subset <- vostok_temp %>%
filter(age_ice >= min_year_vostok & age_ice <= max_year_vostok)
ggplot(vostok_temp_subset, aes(x = age_ice/1000, y = temperature)) +
geom_line(color = "blue", size = 1) +
geom_smooth(method = "lm", color = "red", se = FALSE) +
labs(x = "thousand years before present", y = "Temperature variation [C]",
title = "Temperature change recorded in Vostok ice core (128357 - 138193 years ago).",
caption = "Data: Carbon Dioxide Information Analysis Center (CDIAC)")
# fit linear regression
score_model <- lm(temperature ~ age_ice, data = vostok_temp_subset)
# view summary of results
summary(score_model)

Now, let’s take a look at past patterns of changes in atmospheric CO2 over time.
# load dataset
vostok_ice <- read_delim("data/vostok_ice.txt", delim = "\t",
skip = 21,
col_names = c("depth", "age_ice", "age_air", "CO2"))

13.5 Final Conclusions
Now it’s time to put everything together. You might want to refer back to the beginning of this section, where we looked at some background information about the IPCC report and the atmospheric energy budget, formulated our central questions, thought about what data sets we can use to answer them, and considered the limitations of our approach.
Well-written papers frequently end their introduction section with a paragraph that summarizes what their study is investigating and how they are investigating that (set of) questions - it forms a “bridge” between the introduction that sets up relevant background information (why is my question important?) and the methods section which is a very detailed account of how data was acquired (experimental design), processed, and analyzed.
If you read several of these paragraphs you will realize that they all contain a statement that follows a general formula along these lines:
In this study, we investigated [CENTRAL QUESTION OR HYPOTHESIS]. To do this we used [DESCRIPTION OF THE TYPE OF DATA SET GENERATED] to [METRICS THAT WERE CALCULATED].
You should always be able to make a 2-3 sentence statement summarizing what you are investigating and how you did it; it’s a good self-check that you know what you’re trying to accomplish.
10 Remember to include units!
13.6 Acknowledgments
These activities are based on the EDDIE Climate Change Module11
11 O’Reilly, C.M., D.C. Richardson, and R.D. Gougis. 15 March 2017. Project EDDIE: Climate Change. Project EDDIE Module 8, Version 1.